NSF PAR Search | NSF Public Access Repository

Extending Energy-Efficient and Scalable DNN Training and Inference with 3D Photonic Accelerator

https://doi.org/10.1109/JETCAS.2025.3591812

Curry, Juliana; Li, Yuan; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (January 2025, IEEE Journal on Emerging and Selected Topics in Circuits and Systems)

Full Text Available
PCM Enabled Low-Power Photonic Accelerator for Inference and Training on Edge Devices

https://doi.org/10.1109/IPDPSW63119.2024.00118

Curry, Juliana; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (July 2024, IEEE)

The convergence of edge computing and artificial intelligence requires that inference be performed on-device to provide rapid response with low latency and high accuracy, without transferring large amounts of data to the cloud. However, power and size limitations make it challenging for electrical accelerators to support both inference and training for large neural network models. To this end, we propose Trident, a low-power photonic accelerator that combines the benefits of phase change material (PCM) and photonics to implement both inference and training in one unified architecture. Emerging silicon photonics has the potential to exploit the parallelism of neural network models, reduce power consumption, and provide high bandwidth density via wavelength division multiplexing, making photonics an ideal candidate for on-device training and inference. As PCM is reconfigurable and non-volatile, we utilize it for two distinct purposes: (i) to maintain the resonant wavelength without expensive electrical or thermal heaters, and (ii) to implement a non-linear activation function, which eliminates the need to move data between memory and compute units. This multi-purpose use of PCM is shown to lead to a significant reduction in energy consumption and execution time. Compared to the photonic accelerators DEAP-CNN, CrossLight, and PIXEL, Trident improves energy efficiency by up to 43% and latency by up to 150% on average. Compared to the electronic edge AI accelerators Google Coral (which uses the Google Edge TPU) and Bearkey TB96-AI, Trident improves energy efficiency by 11% and 93%, respectively. While the NVIDIA AGX Xavier is more energy efficient, the reduced data movement and GST activation of Trident reduce latency by 107% on average compared to the NVIDIA accelerator. Compared to the Google Coral and the Bearkey TB96-AI, Trident reduces latency by 1413% and 595% on average, respectively.
Full Text Available
PCM Enabled Low-Power Photonic Accelerator for Inference and Training on Edge Devices

Curry, Juliana; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (May 2024, IEEE)

Full Text Available
Flumen: Dynamic Processing in the Photonic Interconnect

https://doi.org/10.1145/3579371.3589110

Shiflett, Kyle; Karanth, Avinash; Bunescu, Razvan; Louri, Ahmed (June 2023, ACM/IEEE International Symposium on Computer Architecture (ISCA))

Full Text Available
Bitwise Neural Network Acceleration Using Silicon Photonics

https://doi.org/10.1145/3453688.3461515

Shiflett, Kyle; Karanth, Avinash; Louri, Ahmed; Bunescu, Razvan (June 2021, GLSVLSI '21: Proceedings of the 2021 on Great Lakes Symposium on VLSI)

Full Text Available
Albireo: Energy-Efficient Acceleration of Convolutional Neural Networks via Silicon Photonics

https://doi.org/10.1109/ISCA52012.2021.00072

Shiflett, Kyle; Karanth, Avinash; Bunescu, Razvan; Louri, Ahmed (June 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA))

Full Text Available
GCNAX: A Flexible and Energy-efficient Accelerator for Graph Convolutional Neural Networks

https://doi.org/10.1109/HPCA51647.2021.00070

Li, Jiajun; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (February 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA))
Full Text Available
CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters

https://doi.org/10.1109/HPCA51647.2021.00058

Li, Jiajun; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (February 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA))
Full Text Available
Hardware-Level Thread Migration to Reduce On-Chip Data Movement Via Reinforcement Learning

https://doi.org/10.1109/TCAD.2020.3012650

Fettes, Quintin; Karanth, Avinash; Bunescu, Razvan; Louri, Ahmed; Shiflett, Kyle (November 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)
Full Text Available
IntelliNoC: a holistic design framework for energy-efficient and reliable on-chip communication for manycores

https://doi.org/10.1145/3307650.3322274

Wang, Ke; Louri, Ahmed; Karanth, Avinash; Bunescu, Razvan (June 2019, the 46th International Symposium on Computer Architecture)

As technology scales, Networks-on-Chip (NoCs), currently used for on-chip communication in manycore architectures, face several problems, including high network latency, excessive power consumption, and low reliability. Simultaneously addressing these problems is proving difficult due to the explosion of the design space and the complexity of handling many trade-offs. In this paper, we propose IntelliNoC, an intelligent NoC design framework that introduces architectural innovations and uses reinforcement learning to manage the design complexity and simultaneously optimize performance, energy efficiency, and reliability in a holistic manner. IntelliNoC integrates three NoC architectural techniques: (1) multifunction adaptive channels (MFACs) to improve energy efficiency; (2) adaptive error detection/correction and re-transmission control to enhance reliability; and (3) a stress-relaxing bypass feature that dynamically powers off NoC components to prevent overheating and fatigue. To handle the complex dynamic interactions induced by these techniques, we train a dynamic control policy using Q-learning, with the goal of providing improved fault tolerance and performance while reducing power consumption and area overhead. Simulation using PARSEC benchmarks shows that our proposed IntelliNoC design improves energy efficiency by 67% and mean-time-to-failure (MTTF) by 77%, and decreases end-to-end packet latency by 32% and area requirements by 25% over a baseline NoC architecture.
Full Text Available
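
The IntelliNoC abstract above says that a control policy is trained with Q-learning. As a rough illustration of that general technique only, the following is a minimal tabular Q-learning sketch. The state names ("hot", "cool"), the action names, the rewards, and the values of alpha and gamma are all hypothetical and are not taken from the paper; the paper's actual state space, reward, and training setup are not described in this listing.

```python
from collections import defaultdict

# Hypothetical action names, loosely inspired by the abstract; not from the paper.
ACTIONS = ("normal", "bypass", "retransmit")


def make_q_table():
    """Return a Q-table mapping state -> {action: value}, initialised to zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q, state):
    """Pick the action with the highest current Q-value in the given state."""
    return max(q[state], key=q[state].get)


# Two hand-written transitions with assumed rewards, for illustration only.
q = make_q_table()
q_update(q, "hot", "bypass", reward=1.0, next_state="cool")
q_update(q, "hot", "normal", reward=-1.0, next_state="hot")
print(greedy_action(q, "hot"))
```

In a real controller the transitions would come from simulation rather than being written by hand, and an exploration strategy such as epsilon-greedy would normally be used during training.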
